Convex Optimization 2015.11.25
\(\lVert y \rVert _* = \sup \{ x^T y : \lVert x \rVert \le 1 \}\)
\(\implies x^T y \le \lVert x \rVert \cdot \lVert y \rVert _*\)
(because \(\frac{x^T}{\lVert x \rVert} y \le \lVert y \rVert _*\))
Want inequality of type: \(x^T y \le f(x) + "f^*(y)"\) for "general" \(f\) (Fenchel's Inequality)
- Definition: For \(f : \mathbb{R}^n \to \mathbb{R}\), the conjugate \(f^*\) of \(f\) is defined by \(f^*(y) = \underset{x}{\sup} (x^T y - f(x))\)
with \(\operatorname{dom} f^* = \) the set of \(y\) for which the \(\sup\) is \(\lt \infty\).
- Example:
- \(f(x) = a^T x + b (x \in \mathbb{R}^n)\)
\(f^*(y) = \underset{x}{\sup} x^T y - a^T x - b = \begin{cases} \infty & \text{if } y \neq a \\ -b & \text{if } y = a \end{cases}\)
- \(f(x) = -\log x (x \gt 0)\)
\((xy + \log x)' = y + \frac{1}{x} = 0 \implies x = -\frac{1}{y}\)
\(f^*(y) = \underset{x \gt 0}{\sup}\, (xy + \log x) = \begin{cases} \infty & \text{if } y \ge 0 \\ -\log(-y) - 1 & \text{if } y \lt 0 \end{cases}\)
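A quick numerical sanity check of this conjugate, approximating the \(\sup\) on a grid (pure-Python sketch; the grid range \((0, 20]\) and the test point \(y = -2\) are arbitrary choices):

```python
import math

# Grid check of f*(y) = sup_{x>0} (x*y + log x) for f(x) = -log x.
y = -2.0
approx = max(x * y + math.log(x) for x in (i / 10000 for i in range(1, 200001)))
exact = -math.log(-y) - 1   # claimed closed form (maximizer x = -1/y)
```

The grid value can only underestimate the true supremum, and here it matches the closed form to high accuracy.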
- \(f(x) = e^x (x \in \mathbb{R})\)
\((xy - e^x)' = y - e^x = 0 \implies x = \log y\)
\(f^*(y) = \underset{x}{\sup}\, (xy - e^x) = \begin{cases} \infty & \text{if } y \lt 0 \\ y\log y - y & \text{if } y \ge 0 \end{cases}\) (with the convention \(0 \log 0 = 0\))
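The same kind of grid check for this example (sketch; the grid \([-5, 5]\) and the test point \(y = 3\) are arbitrary):

```python
import math

# Grid check of f*(y) = sup_x (x*y - e^x) for f(x) = e^x.
n, lo, hi = 200001, -5.0, 5.0
step = (hi - lo) / (n - 1)
y = 3.0
approx = max((lo + i * step) * y - math.exp(lo + i * step) for i in range(n))
exact = y * math.log(y) - y   # claimed closed form (maximizer x = log y)
```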
- \(f(x) = x \log x (x \ge 0)\)
\((xy - x \log x)' = y - \log x - 1 = 0 \implies x = e^{y - 1}\)
\(f^*(y) = \underset{x \ge 0}{\sup}\, (xy - x \log x) = y e^{y - 1} - (y - 1) e^{y - 1} = e^{y - 1}\)
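Grid check for this conjugate as well (sketch; the grid \((0, 10]\) and the test point \(y = 2\) are arbitrary):

```python
import math

# Grid check of f*(y) = sup_{x>0} (x*y - x*log x) for f(x) = x log x.
n, hi = 200000, 10.0
y = 2.0

def obj(x):
    return x * y - x * math.log(x)

approx = max(obj(hi * i / n) for i in range(1, n + 1))
exact = math.exp(y - 1)   # claimed closed form (maximizer x = e^{y-1})
```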
- \(f(x) = \frac{1}{2} x^T Qx\) with \(Q \in S_{++}^n\)
\(f^*(y) = \underset{x}{\sup} x^T y - \frac{1}{2} x^T Qx = y^T Q^{-1} y - \frac{1}{2} y^T Q^{-1} y = \frac{1}{2} y^T Q^{-1} y\)
(\(\underset{x}{\inf}\, x^T Ax + x^T b \implies \text{best } x = -\frac{1}{2} A^{-1} b\))
So \(x = Q^{-1} y\)
\(\implies x^T y \le \frac{1}{2} x^T Qx + \frac{1}{2} y^T Q^{-1} y\), for all \(Q \succ 0\)
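This quadratic Fenchel inequality can be checked numerically without any linear-algebra library by taking \(Q\) diagonal (a sketch; the entries of \(Q\) and the points \(x, y\) are arbitrary example values):

```python
# Check x^T y <= (1/2) x^T Q x + (1/2) y^T Q^{-1} y for diagonal positive definite Q.
Q = [2.0, 4.0]                     # diagonal entries of Q
Qinv = [1.0 / q for q in Q]

def quad(d, v):                    # (1/2) v^T diag(d) v
    return 0.5 * sum(di * vi * vi for di, vi in zip(d, v))

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x, y = [1.0, 2.0], [3.0, -1.0]
lhs, rhs = dot(x, y), quad(Q, x) + quad(Qinv, y)
xstar = [qi * yi for qi, yi in zip(Qinv, y)]   # x = Q^{-1} y attains equality
gap = quad(Q, xstar) + quad(Qinv, y) - dot(xstar, y)
```

At \(x = Q^{-1} y\) the gap closes, confirming where the inequality is tight.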
- \(f(x) = \log \left( \sum _{i = 1}^n e^{x_i} \right)\)
\(f^*(y) = \underset{x}{\sup} x^T y - \log \left( \sum _{i = 1}^n e^{x_i} \right)\)
\(\frac{\partial}{\partial x_i} \left(x^T y - \log \left( \sum _{j = 1}^n e^{x_j} \right) \right) = y_i - \frac{e^{x_i}}{\sum _{j = 1}^n e^{x_j}} = 0\)
\(\implies y_i = \frac{e^{x_i}}{\sum _{i = 1}^n e^{x_i}}, y \succeq 0, 1^T y = 1\)
assume for simplicity, \(y \succ 0\)
put \(x_i = \log (y_i)\), then \(\sum e^{x_i} = 1^T y = 1\) and optimality conditions hold
then \(f^*(y) = \sum _{i = 1}^n y_i \log (y_i) - \log (1^T y) = \sum _{i = 1}^n y_i \log (y_i)\)
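A 2-D grid check that this negative-entropy formula really is the supremum (sketch; the simplex point \(y = (0.3, 0.7)\) and the grid \([-2, 2]^2\) are arbitrary choices):

```python
import math

# Conjugate of log-sum-exp at a simplex point y equals sum_i y_i log y_i.
y = (0.3, 0.7)

def obj(x1, x2):   # x^T y - log(e^{x1} + e^{x2})
    return x1 * y[0] + x2 * y[1] - math.log(math.exp(x1) + math.exp(x2))

grid = [i / 100 for i in range(-200, 201)]
best = max(obj(a, b) for a in grid for b in grid)
exact = sum(yi * math.log(yi) for yi in y)
```

Note the objective is unchanged by adding a constant to all \(x_i\), so the maximizer \(x_i = \log y_i\) is one of many.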
- \(f(x) = \lVert x \rVert\)
\(f^*(y) = \underset{x}{\sup}\, (x^T y - \lVert x \rVert) = \begin{cases} 0 & \text{if } \lVert y \rVert _* \le 1 \\ \infty & \text{if } \lVert y \rVert _* \gt 1 \end{cases}\)
\(x^T y - \lVert x \rVert \le \lVert x \rVert \cdot \lVert y \rVert _* - \lVert x \rVert = \lVert x \rVert (\lVert y \rVert _* - 1) \le 0\) if \(\lVert y \rVert _* \le 1\), with equality at \(x = 0\); if \(\lVert y \rVert _* \gt 1\), pick \(x\) with \(\lVert x \rVert \le 1\) and \(x^T y \gt 1\), then \(t x^T y - \lVert t x \rVert \to \infty\) as \(t \to \infty\)
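Both cases can be seen concretely with the \(\ell_1\) norm on \(\mathbb{R}^2\), whose dual norm is \(\ell_\infty\) (a sketch; the test points \(y\) with \(\lVert y \rVert_\infty = 1\) and \(\lVert y \rVert_\infty = 1.5\) are arbitrary):

```python
# Conjugate of the l1 norm: 0 on the dual (l_inf) unit ball, +infinity outside.
def obj(x1, x2, y):   # x^T y - ||x||_1
    return x1 * y[0] + x2 * y[1] - abs(x1) - abs(x2)

grid = [i / 10 for i in range(-100, 101)]              # x in [-10, 10]^2
sup_in = max(obj(a, b, (0.5, -1.0)) for a in grid for b in grid)   # ||y||_inf = 1
growth = [obj(t, 0.0, (1.5, 0.0)) for t in (10.0, 100.0, 1000.0)]  # ||y||_inf > 1
```

Inside the dual ball the grid supremum is 0; outside, the objective grows without bound along a ray.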
- \(f(x) = \frac{1}{2} \lVert x \rVert ^2\)
\(f^*(y) = \underset{x}{\sup} x^T y - \frac{1}{2} \lVert x \rVert ^2 = \frac{1}{2} \lVert y \rVert _*^2\)
\(x^T y - \frac{1}{2} \lVert x \rVert ^2 \le \lVert x \rVert \cdot \lVert y \rVert _* - \frac{1}{2} \lVert x \rVert ^2 \le \frac{1}{2} \lVert y \rVert _*^2\) (\(\lVert x \rVert = \lVert y \rVert_*\))
\(\implies x^T y \le \frac{1}{2} \lVert x \rVert ^2 + \frac{1}{2} \lVert y \rVert _*^2\)
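A grid check that the *dual* norm really appears in this conjugate, again using the \(\ell_1\)/\(\ell_\infty\) pair on \(\mathbb{R}^2\) (sketch; the test point \(y = (0.8, 0.4)\) and grid are arbitrary):

```python
# f(x) = (1/2)||x||_1^2 has conjugate f*(y) = (1/2)||y||_inf^2.
def obj(x1, x2, y):
    n1 = abs(x1) + abs(x2)                     # ||x||_1
    return x1 * y[0] + x2 * y[1] - 0.5 * n1 * n1

y = (0.8, 0.4)
grid = [i / 100 for i in range(-200, 201)]     # x in [-2, 2]^2
approx = max(obj(a, b, y) for a in grid for b in grid)
exact = 0.5 * max(abs(y[0]), abs(y[1])) ** 2   # (1/2)||y||_inf^2
```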
- Proof of general hyperplane separation:
Let \(C \subseteq \mathbb{R}^n\) be a convex set and \(H \subseteq \mathbb{R}^n\) the affine subspace of smallest dimension containing \(C\); write \(C_{\varepsilon} = \{ x : B_{\varepsilon} (x) \cap H \subseteq C \}\)
then \(C_{\varepsilon} \subseteq "\text{relint} (C)" = \underset{\varepsilon \gt 0}{\bigcup} C_{\varepsilon}\). (relint: relative interior)
(\(C \subseteq \overline{\text{relint}(C)}\), \(C\) is a subset of closure of \(\text{relint}(C)\))
Let \(C, D\) be disjoint convex sets. Then for every \(\varepsilon \gt 0\) the sets \(A_{\varepsilon} = \overline{C_{\varepsilon}} \cap B_{\frac{1}{\varepsilon}}(0)\) and \(\overline{D}\) are closed disjoint convex sets, \(A_{\varepsilon}\) is bounded, and \(\text{dist}(A_{\varepsilon}, \overline{D}) \ge \varepsilon \gt 0\).
So \(\exists a_{\varepsilon} \in \mathbb{R}^n\), \(a_{\varepsilon} \neq 0\), \(b_{\varepsilon} \in \mathbb{R}\) s.t. \((a_{\varepsilon}, b_{\varepsilon})\) defines a separating hyperplane for \(A_{\varepsilon}, \overline{D}\).
\(a_{\varepsilon}^T x \le b_{\varepsilon} \; \forall x \in A_{\varepsilon}\), \(a_{\varepsilon}^T x \ge b_{\varepsilon} \; \forall x \in \overline{D}\)
WLOG \(\lVert a_{\varepsilon} \rVert = 1\)
The sequence \((\vec{a}_\frac{1}{n})_{n = 1}^{\infty}\) is a sequence of unit vectors and so has a convergent subsequence, say WLOG convergent to \(a_0 \in \mathbb{R}^n\).
can assume the sequence \(b_{\frac{1}{n}}\) is bounded (or else one of the sets \(C, D\) is empty)
and so also convergent to some value \(b_0 \in \mathbb{R}\).
Want to show \((a_0, b_0)\) is SH for \(C, D\), i.e., that
\(a_0^T x \le b_0 \; \forall x \in C, \, a_0^T x \ge b_0 \; \forall x \in D\)
(Assume \(C\) is not a point; then the proof is as above. If \(C\) is a point but \(D\) is not, switch the roles of \(C, D\). If both \(C, D\) are points, the claim is obviously true.)
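For the bounded, positive-distance case the separating hyperplane can be built explicitly from the closest pair of points. A minimal sketch for two disjoint disks (the centers and radii are arbitrary example values, not from the proof above):

```python
import math

# Closest-point pair of two disjoint disks gives a separating hyperplane a^T x = b.
cC, rC = (0.0, 0.0), 1.0
cD, rD = (4.0, 0.0), 1.0
dx, dy = cD[0] - cC[0], cD[1] - cC[1]
dist = math.hypot(dx, dy)
a = (dx / dist, dy / dist)                     # unit normal along the center line
pC = (cC[0] + rC * a[0], cC[1] + rC * a[1])    # closest point of disk C
pD = (cD[0] - rD * a[0], cD[1] - rD * a[1])    # closest point of disk D
b = 0.5 * sum(ai * (pi + qi) for ai, pi, qi in zip(a, pC, pD))  # midpoint offset

def support(c, r, sign):   # extreme value of a^T x over a disk (a is unit length)
    return a[0] * c[0] + a[1] * c[1] + sign * r

maxC = support(cC, rC, +1)   # sup of a^T x over C
minD = support(cD, rD, -1)   # inf of a^T x over D
```

Here `maxC <= b <= minD`, i.e. \((a, b)\) separates the two disks with a positive gap.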
Log-convexity and log-concavity
Definition: \(f : \mathbb{R}^n \to \mathbb{R}_{\gt 0}\) is log-convex (log-concave) if \(\log (f)\) is convex (concave).
Convexity of \(\log f\) means:
\(\log (f(\theta x + (1 - \theta) y)) \le \theta \log (f(x)) + (1 - \theta) \log (f(y)) = \log (f(x)^{\theta} f(y)^{1 - \theta})\)
\(\iff f(\theta x + (1 - \theta)y) \le f(x)^{\theta} f(y)^{1 - \theta}\)
- Remark 2: log-convex \(\implies\) convex, since \(f(x) = e^{\log f(x)}\) is the composition of the convex increasing function \(e^u\) with the convex function \(\log f\) (QED)
Similarly, concave \(\implies\) log-concave (for \(f \gt 0\)), since \(\log\) is concave and increasing.
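Both directions can be checked via the equivalent product inequality derived above (a sketch; the functions \(e^{u^2}\), \(\sqrt{u}\) and the points \(x, y, \theta\) are arbitrary choices):

```python
import math

# Check f(tx+(1-t)y) <= f(x)^t f(y)^(1-t) (log-convex) and the reverse (log-concave).
f = lambda u: math.exp(u * u)   # log f(u) = u^2 is convex, so f is log-convex
g = lambda u: math.sqrt(u)      # g concave and positive on u > 0, so log-concave
x, y, t = 1.0, 4.0, 0.3
lc = f(t * x + (1 - t) * y) <= f(x) ** t * f(y) ** (1 - t)   # log-convex direction
lg = g(t * x + (1 - t) * y) >= g(x) ** t * g(y) ** (1 - t)   # log-concave direction
```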